Semi-supervised Distributed Clustering with Mahalanobis Distance Metric Learning
Authors
Abstract
Semi-supervised clustering uses a small amount of supervised information to aid unsupervised learning. As one of the semi-supervised clustering methods, metric learning has been widely used to cluster centralized data points. However, many data sets are distributed and cannot be centralized for various reasons. Based on the MPCK-MEANS framework [1], a method for distributed Mahalanobis distance learning, called Distributed MPCK-MEANS, is proposed. Like MPCK-MEANS, Distributed MPCK-MEANS adopts an EM iteration framework to perform clustering and metric learning simultaneously. To decrease communication costs and to protect data security and privacy, corresponding transmission parameters are designed, and the transmission techniques are implemented. The proposed method differs from Parallel K-means, although it resembles it to some degree. The proposed method is tested on artificial and real data sets. Compared with Parallel K-means, Distributed MPCK-MEANS effectively improves clustering quality.
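The EM-style scheme described above can be sketched as follows. This is a minimal illustration, not the paper's algorithm: the two-site split, the cluster count, and the pooled-covariance metric update are assumptions made for the sketch, and the actual MPCK-MEANS constraint terms and the paper's transmission parameters are omitted. The key idea shown is that each site transmits only aggregated statistics (sums, counts, scatter), never its raw points.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data split across two "sites" (hypothetical setup).
site_data = [rng.normal(0.0, 1.0, size=(50, 2)),
             rng.normal(3.0, 1.0, size=(50, 2))]
k, d = 2, 2

M = np.eye(d)                                      # Mahalanobis metric
centroids = np.vstack([X[0] for X in site_data])   # crude init: one point per site

def mahalanobis(x, c, M):
    """Squared Mahalanobis distance between a point and a centroid."""
    diff = x - c
    return diff @ M @ diff

for _ in range(10):
    # E-step, run locally at each site: assign points, then report only
    # per-cluster sums, counts, and scatter -- not the raw points.
    sums = np.zeros((k, d))
    counts = np.zeros(k)
    scatter = np.zeros((d, d))
    for X in site_data:
        dists = np.array([[mahalanobis(x, c, M) for c in centroids] for x in X])
        labels = dists.argmin(axis=1)
        for j in range(k):
            pts = X[labels == j]
            sums[j] += pts.sum(axis=0)
            counts[j] += len(pts)
            diffs = pts - centroids[j]
            scatter += diffs.T @ diffs
    # M-step, run at the coordinator from the aggregated statistics.
    for j in range(k):
        if counts[j] > 0:
            centroids[j] = sums[j] / counts[j]
    # Simple stand-in for the metric update: inverse of the regularized
    # pooled within-cluster covariance (an assumption of this sketch).
    M = np.linalg.inv(scatter / counts.sum() + 1e-6 * np.eye(d))
```

Because only `sums`, `counts`, and `scatter` cross site boundaries, the communication cost per iteration is O(k·d + d²) regardless of how many points each site holds, which is the motivation for designing transmission parameters rather than centralizing the data.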
Similar articles
Composite Kernel Optimization in Semi-Supervised Metric
Machine-learning solutions to classification, clustering and matching problems critically depend on the adopted metric, which in the past was selected heuristically. In the last decade, it has been demonstrated that an appropriate metric can be learnt from data, resulting in superior performance as compared with traditional metrics. This has recently stimulated a considerable interest in the to...
Learning Bregman Distance Functions and Its Application for Semi-Supervised Clustering
Learning distance functions with side information plays a key role in many machine learning and data mining applications. Conventional approaches often assume a Mahalanobis distance function. These approaches are limited in two aspects: (i) they are computationally expensive (even infeasible) for high dimensional data because the size of the metric is in the square of dimensionality; (ii) they ...
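The quadratic cost noted in point (i) above is easy to quantify: a full Mahalanobis metric on d-dimensional data is a d × d matrix, so the number of parameters to learn grows as d². A minimal illustration (the dimension 1000 is an arbitrary choice for the example):

```python
import numpy as np

d = 1000
M = np.eye(d)        # a full Mahalanobis metric is a d x d matrix
n_params = M.size    # d**2 = 1,000,000 entries to learn

# With M = I, the Mahalanobis distance reduces to the Euclidean distance.
x = np.random.default_rng(1).normal(size=d)
y = np.zeros(d)
dist = np.sqrt((x - y) @ M @ (x - y))
```

This quadratic growth is what makes full-matrix metric learning expensive or infeasible in high dimensions, motivating alternatives such as the Bregman distance functions discussed in the excerpt.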
Semi-Supervised Metric Learning Using Pairwise Constraints
Distance metric has an important role in many machine learning algorithms. Recently, metric learning for semi-supervised algorithms has received much attention. For semi-supervised clustering, usually a set of pairwise similarity and dissimilarity constraints is provided as supervisory information. Until now, various metric learning methods utilizing pairwise constraints have been proposed. The...
An investigation on scaling parameter and distance metrics in semi-supervised Fuzzy c-means
The scaling parameter α helps maintain a balance between supervised and unsupervised learning in semi-supervised Fuzzy c-Means (ssFCM). In this study, we investigated the effects of different α values, 0.1, 0.5, 1 and 10 in Pedrycz and Waletsky’s ssFCM with various amounts of labelled data, 10%, 20%, 30%, 40%, 50% and 60% and three distance metrics, Euclidean, Mahalanobis and kernel-based on th...
Information-theoretic Semi-supervised Metric Learning via Entropy Regularization
We propose a general information-theoretic approach to semi-supervised metric learning called SERAPH (SEmi-supervised metRic leArning Paradigm with Hypersparsity) that does not rely on the manifold assumption. Given the probability parameterized by a Mahalanobis distance, we maximize its entropy on labeled data and minimize its entropy on unlabeled data following entropy regularization. For met...
Journal: JDCTA
Volume 4, Issue -
Pages -
Publication date: 2010